Los datos consisten en observaciones mensuales de cuadal en 74 estaciones: las fechas de observación van desde enero de 1969 hasta diciembre de 1979. Cada estación representa una cuenca hidrológica en Centro América, y en total se tienen observaciones de 12 meses en cada uno de los 11 años para 74 locaciones. También, se tienen datos de climatologÃa mensual para cada una de las estaciones.
## [1] 132 74
## [1] 12 74
Las locaciones de las estaciones se pueden apreciar en el siguiente mapa:
## [1] 74 4
## left bottom right top
## -112.1223 5.6804 -75.2067 31.9196
## converting bounding box to center/zoom specification. (experimental)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=18.8,-93.6645&zoom=5&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Como las locaciones incluyen varias estaciones en el Norte de México, se realiza un corte en la latitud 20 y en la longitud -100, para obtener 41 estaciones localizadas en Centroamérica:
## [1] 41 4
## left bottom right top
## -101.7405 6.7175 -76.1505 20.5115
## converting bounding box to center/zoom specification. (experimental)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=13.6145,-88.9455&zoom=5&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
caudalt <- caudal %>% gather(estacion,X1:X74, -Año, -Mes, -Cantidad.de.dÃas.del.mes)
names(caudalt) <- c("anio","mes","ndiasm","estacion","escorre")
caudal2 <- caudal[,-c(1:3)][,estaciones]
clima2 <- (as.matrix(clima[,-1]) %x% rep(1, 11))[,estaciones]
anomalies <- caudal2-clima2
names(anomalies) <- names(caudal2) <- paste0("E",estaciones)
boxplot(caudal2, ylab="EscorrentÃa")
points(apply(clima2,2,mean), col="red")
boxplot(anomalies, ylab="AnomalÃas")
caudalt %>% group_by(estacion) %>% summarize (
N = n(),
Escorrentia_Promedio = mean(escorre),
Escorrentia_StDev = sd(escorre)) %>%
kable() %>%
kable_styling()
| estacion | N | Escorrentia_Promedio | Escorrentia_StDev |
|---|---|---|---|
| X1 | 132 | 0.1012712 | 0.2083986 |
| X10 | 132 | 0.5359818 | 0.6842002 |
| X11 | 132 | 0.1801530 | 0.1598324 |
| X12 | 132 | 0.3961697 | 0.6546592 |
| X13 | 132 | 0.6569144 | 0.8619023 |
| X14 | 132 | 0.5028538 | 0.6769622 |
| X15 | 132 | 0.6280818 | 0.8109360 |
| X16 | 132 | 1.2294447 | 1.8718943 |
| X17 | 132 | 0.7177409 | 1.0264006 |
| X18 | 132 | 0.2875326 | 0.4014419 |
| X19 | 132 | 0.1959530 | 0.2417112 |
| X2 | 132 | 0.0768152 | 0.2820430 |
| X20 | 132 | 0.1396008 | 0.2482415 |
| X21 | 132 | 0.4173152 | 0.5331032 |
| X22 | 132 | 0.3177515 | 0.4479770 |
| X23 | 132 | 0.6585303 | 0.7207204 |
| X24 | 132 | 0.8828424 | 1.3499088 |
| X25 | 132 | 1.6085735 | 1.9102860 |
| X26 | 132 | 3.0750530 | 3.1324255 |
| X27 | 132 | 0.8613576 | 0.9437757 |
| X28 | 132 | 0.3941220 | 0.6453538 |
| X29 | 132 | 0.0071576 | 0.0024426 |
| X3 | 132 | 0.0364856 | 0.0504310 |
| X30 | 132 | 0.0668485 | 0.1353348 |
| X31 | 132 | 0.1991871 | 0.4172669 |
| X32 | 132 | 0.7312091 | 1.1933462 |
| X33 | 132 | 0.6985523 | 0.7826447 |
| X34 | 132 | 2.3818848 | 3.3956285 |
| X35 | 132 | 2.1651992 | 3.1032732 |
| X36 | 132 | 1.2866598 | 1.1801042 |
| X37 | 132 | 2.4169439 | 2.8947287 |
| X38 | 132 | 3.2640591 | 2.0492120 |
| X39 | 132 | 2.1885689 | 2.2773171 |
| X4 | 132 | 0.1417864 | 0.2504622 |
| X40 | 132 | 2.7165485 | 1.7787828 |
| X41 | 132 | 4.2308894 | 4.3783027 |
| X42 | 132 | 0.3406689 | 0.3045772 |
| X43 | 132 | 0.4580795 | 0.8978680 |
| X44 | 132 | 1.6686795 | 1.6261041 |
| X45 | 132 | 6.8344114 | 6.4701808 |
| X46 | 132 | 3.4932129 | 2.2369205 |
| X47 | 132 | 6.4359932 | 4.8595835 |
| X48 | 132 | 1.9258288 | 1.8118980 |
| X49 | 132 | 1.0767970 | 1.3398358 |
| X5 | 132 | 0.0059561 | 0.0147539 |
| X50 | 132 | 1.4034295 | 1.5095322 |
| X51 | 132 | 0.9393735 | 1.0632060 |
| X52 | 132 | 0.7170348 | 1.7510073 |
| X53 | 132 | 0.5813038 | 0.7158349 |
| X54 | 132 | 0.9148394 | 0.9314110 |
| X55 | 132 | 0.9505023 | 2.0128306 |
| X56 | 132 | 0.9982326 | 2.0896191 |
| X57 | 132 | 0.3858250 | 0.7894764 |
| X58 | 132 | 0.4703621 | 0.7814667 |
| X59 | 132 | 2.4603455 | 2.6456829 |
| X6 | 132 | 0.0105803 | 0.0162495 |
| X60 | 132 | 1.6487795 | 1.6659602 |
| X61 | 132 | 1.2504667 | 0.8378243 |
| X62 | 132 | 12.6000409 | 10.6346524 |
| X63 | 132 | 4.2249205 | 2.8528697 |
| X64 | 132 | 6.1103144 | 4.9578727 |
| X65 | 132 | 6.9995644 | 3.6296689 |
| X66 | 132 | 7.3712205 | 4.8715207 |
| X67 | 132 | 10.2715386 | 7.1623054 |
| X68 | 132 | 7.3007818 | 6.0750662 |
| X69 | 132 | 6.4036182 | 5.7319989 |
| X7 | 132 | 0.0656902 | 0.1801727 |
| X70 | 132 | 5.6396955 | 4.5012564 |
| X71 | 132 | 2.8597848 | 3.0331741 |
| X72 | 132 | 3.5003023 | 3.1059296 |
| X73 | 132 | 2.8758371 | 2.6404626 |
| X74 | 132 | 10.6376697 | 7.9539380 |
| X8 | 132 | 0.3802273 | 0.5320165 |
| X9 | 132 | 0.4202583 | 0.5474960 |
Primero, se deben calcular los PCA utilizando las anomalÃas en lugar de las observaciones. Como se tiene la climatologÃa mensual para cada locación, el cálculo consiste en restar la climatologÃa mensual a cada observación, según el mes correspondiente. Luego, se procede a hacer el PCA y por último agrupar las estaciones utilizando k-means de los primeros 10 componentes.
clima2 <- (as.matrix(clima[,-1]) %x% rep(1, 11))[,estaciones]
anomalies <- caudal2-clima2
eofNum(anomalies)
eofPlot(eof(anomalies, n = 6), type="coef")
#eofPlot(eof(anomalies, n = 6), type="amp")
mydata <- (eof(anomalies, n = 6)$REOF[,1:6])
fit <- kmeans(mydata, 6)
aggregate(mydata,by=list(fit$cluster),FUN=mean)
## Group.1 EOF1 EOF2 EOF3 EOF4 EOF5
## 1 1 -0.22715514 -0.8089087 -0.1243321 -0.1481689 -0.035708394
## 2 2 -0.33017071 -0.1520441 -0.2112088 -0.7914923 0.083745489
## 3 3 -0.43127070 -0.1328040 -0.6748793 -0.3457351 0.099499609
## 4 4 0.05051849 -0.1236158 0.1395286 0.1929107 -0.906821357
## 5 5 -0.30648031 -0.2367611 -0.3918980 -0.7134285 0.037176395
## 6 6 -0.68188720 -0.4052669 -0.3042273 -0.3533304 0.006943709
## EOF6
## 1 -0.009155738
## 2 0.135560898
## 3 -0.047684995
## 4 -0.055181075
## 5 -0.174910613
## 6 -0.004285151
loca <- data.frame(loca, cluster=fit$cluster)
ggmap(sq_map) + geom_point(data = loca, mapping = aes(x = Longitud, y = Latitud, colour=as.character(cluster)))
Descripción de los clusters:
medianas <- apply(caudal2,2,median)
data <- as_tibble(cbind(loca,medianas))
names(data) <- c("cod","lat","lon","area","rank_var","Mediana_de_cluster")
## Area promedio y mediana de cada cluster:
kable(data %>% group_by(Cluster=rank_var) %>% summarize(Area_prom =mean(area), Mediana_Esc = median(Mediana_de_cluster)),digits=2)
| Cluster | Area_prom | Mediana_Esc |
|---|---|---|
| 1 | 712.00 | 6.66 |
| 2 | 8964.25 | 1.00 |
| 3 | 2211.14 | 0.54 |
| 4 | 13164.00 | 0.01 |
| 5 | 11574.00 | 1.26 |
| 6 | 1700.93 | 4.28 |
matplot(caudal2,type="l", col=data$rank_var, ylab="EscorrentÃa",xaxt='n', main="EscorrentÃa por mes según número de cluster")
axis(1, at = seq(1,132 , by = 6), las=2, labels=paste0(caudal$Mes,"-",caudal$Año)[seq(1,132 , by = 6)])
legend("topright", levels(as.factor(data$rank_var)),col=1:6,cex=0.8,fill=1:6)
matplot(anomalies,type="l", col=data$rank_var, ylab="AnomalÃas",xaxt='n', main="AnomalÃas por mes según número de cluster")
axis(1, at = seq(1,132 , by = 6), las=2, labels=paste0(caudal$Mes,"-",caudal$Año)[seq(1,132 , by = 6)])
legend("topright", levels(as.factor(data$rank_var)),col=1:6,cex=0.8,fill=1:6)